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Introduction 

From  a  systems  engineering  point  of  view,  cells  consist  of  a  complex  set  of  nested,  nonlinear  control 
systems  dominated  by  feed-back  and  feed-forward  loops.  The  more  complex  the  system  is,  the  more  im¬ 
portant  are  the  issues  concerning  the  robustness  and  parameter  optimization,  therefore  modeling  and 
simulations  are  important  for  both  engineering  and  reverse-engineering  of  biosystems. 

Objective 

At  the  functional  level,  all  biological  processes  in  cells  can  be  represented  as  a  series  of  biochemical  re¬ 
actions.  Since  such  reactions  are  stochastic  in  nature,  the  user  must  run  thousands  of  simulations  to  char¬ 
acterize  the  “ensemble”  behavior  of  biological  systems.  We  developed  a  software  package  called  Bio- 
molecular  Network  Simulator  (BNS)  to  model  and  simulate  complex  biomolecular  reaction  networks 
using  High  Performance  Computing  (HPC). 

Methodology 

The  Biomolecular  Network  Simulator  uses  the  Gillespie  stochastic  algorithm  to  simulate  the  evolution 
of  a  system  of  biochemical  reactions.  The  BNS  code  is  a  combination  of  MATLAB  and  C-coded  mod¬ 
ules.  This  combination  allows  one  to  use  the  interactive  features  and  visualization  tools  of  MATLAB, 
while  achieving  high  speed  for  the  computationally  intensive  part  of  the  software.  The  software  is  paral¬ 
lelized  with  the  MPI  library  to  run  on  multiprocessor  architectures. 

Result 

The  Biomolecular  Network  Simulator  consists  of  two  sets  of  tools:  for  simulations  of  the  system  and  for 
the  analysis  of  simulation  results.  The  Graphical  User  Interface  of  BNS  allows  users  to  easily  set  pa¬ 
rameters  for  the  model  and  simulations  and  to  select  analysis  method.  Multiple  types  of  post-simulation 
analyses  are  available. 

Significance  to  DoD 

The  Developed  software  allows  DoD  scientists  to  build,  simulate  and  analyze  complex  cellular  bio¬ 
molecular  networks  utilizing  the  capacities  of  HPC.  It  provides  the  foundational  capability  to  design  and 
integrate  biological  constructs  into  a  new  generation  of  biotechnology  products. 


Public  reporting  burden  for  the  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing  data  sources,  gathering  and 
maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate  or  any  other  aspect  of  this  collection  of  information, 
including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services,  Directorate  for  Information  Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington 
VA  22202-4302.  Respondents  should  be  aware  that  notwithstanding  any  other  provision  of  law,  no  person  shall  be  subject  to  a  penalty  for  failing  to  comply  with  a  collection  of  information  if  it 
does  not  display  a  currently  valid  OMB  control  number. 
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1  INTRODUCTION 


All  chemical  reactions  require  the  interaction  of  the  reactants  with  sufficient  energy  to  overcome  the  ac¬ 
tivation  energy  barrier  and,  fundamentally,  they  are  stochastic  in  nature.  The  best  way  to  model  kinetics 
of  a  system  of  chemical  reactions  is  to  use  a  stochastic  approach  in  terms  of  the  Chemical  Master  Equa¬ 
tion  (CME),  with  the  number  of  molecules  of  each  molecular  species  as  variables.  The  CME  describes 
transitions  of  the  system  from  one  state  to  another  state  based  on  probabilistic  methods.  Gillespie  pro¬ 
posed  a  method  to  simulate  probabilistically-correct  trajectories  based  on  the  CME  through  the  use  of 
Monte  Carlo  methods  [1], 

Although  the  Gillespie  stochastic  algorithm  is  a  method  for  exact  simulations,  as  it  explicitly  counts 
each  reaction  event  that  occurred  in  the  system,  the  accuracy  of  the  method  comes  at  a  high  computa¬ 
tional  cost.  This  is  especially  true  for  systems  with  a  large  number  of  molecular  species  where  reactions 
occur  numerous  times  in  a  short  period  of  time.  To  accelerate  discrete  stochastic  simulation,  Gillespie 
proposed  the  tau-leaping  method  as  an  approximate  simulation  strategy  [2],  By  using  Poisson  random 
numbers,  the  tau-leaping  method  can  leap  over  many  reactions  without  a  significant  loss  of  accuracy. 

We  developed  a  software  package  -  the  Biomolecular  Network  Simulator  (BNS)  -  that  can  use  the 
Gillespie  stochastic  algorithm  or  the  tau-leaping  algorithm  to  simulate  the  behavior  of  a  system  of  bio¬ 
chemical  reactions.  It  allows  scientists  to  build  a  synthetic  biomolecular  network  and  explore  its  per¬ 
formance  utilizing  the  capacities  of  High  Performance  Computing.  BNS  contains  tools  for  both  simulat¬ 
ing  the  system  and  for  analyzing  the  results  of  the  simulation.  Since  the  simulations  are  stochastic  in 
nature,  the  user  must  run  thousands  of  simulations  to  characterize  the  “ensemble”  behavior  of  biological 
systems.  The  parallelized  BNS  code  allows  users  to  run  multiple  simulations  and  store  results  on  multi¬ 
processor  platforms.  In  this  paper,  we  present  a  brief  description  of  the  Biomolecular  Network  Simulator 
software  along  with  some  examples. 

2  STOCHASTIC  SIMULATION  ALGORITHM 

Let  us  consider  a  system  composed  of  N  well  mixed  chemical  species,  5/  (/  =  1,...N),  in  a  fixed  volume 
V,  which  are  involved  in  M  reactions,  Rj  ( j  =  1,...M).  The  dynamical  state  of  the  system  can  be  speci¬ 
fied  by  the  state  vector  X(t)  =  (Xi(t),  X2(t),...XN(t)),  where  Xj(t)  is  the  number  of  molecules  of  species  Si 
at  time  t. 

The  probability  for  each  reaction  R,  is  defined  by  a  propensity  function  cij  (X)-  q  -Hj(X) ,  where  q  is 
the  stochastic  probability  constant  and  Hj  (X)  represents  the  number  of  possible  combinations  of  reac¬ 
tants.  The  probability  that  reaction  Rj  will  occur  in  the  time  interval  dt  is  defined  as  cij(X)dt.  Each  reac¬ 
tion  is  also  characterized  by  its  state-change  vector  Vj  =  (vq,  i’2j ,  . . .  vNj),  where  vy  is  the  change  in  the 
number  of  molecules  Si  caused  by  one  Rj  reaction  . 

To  study  the  evolution  of  the  state  vector  X(t),  Gillespie  proposed  an  algorithm  for  Monte  Carlo 
generation  of  stochastic  trajectories  [1].  The  direct  simulation  algorithm,  which  is  implemented  in  the 
Biomolecular  Network  Simulator,  answers  two  questions:  (1)  which  reaction  will  occur  next,  and  (2) 
what  is  the  waiting  time  for  the  next  reaction  to  occur. 

To  answer  these  questions,  two  random  numbers  uniformly  distributed  over  the  interval  (0,1)  -  r/ 
and  r?  -  are  generated.  The  first  random  number  is  used  to  determine  the  next  reaction  Rj ,  such  that 

j- 1  J 

Tjai<rl-a0<Yjai, 

i= 1  i=l 


(1) 


where 


M 

a0=Yjaj-  (2) 

The  distribution  of  the  waiting  time  is  given  by  following  probability  density  function: 

P(T,  j )  =  aj  exp (~a0T) .  (3) 

Here,  P(r,  j )  is  the  probability  that  the  waiting  time  for  the  reaction  is  T  and  that  it  will  be  an  R,  reaction. 
The  waiting  time  for  the  next  reaction  is  calculated  as  [1] 

t  =  —  log—  ■  (4) 

a  o  r2 

After  the  next  reaction  and  its  waiting  time  are  determined,  the  reaction  is  executed  and  the  state  of 
the  system  is  changed  according  to  the  state-change  vector  Vj.  The  simulation  time  is  increased  then  by  t 
and  the  next  simulation  step  is  generated. 

The  Gillespie  stochastic  algorithm  is  exact  for  the  elementary  reactions  (uni-uni,  uni-bi  and  bi-uni 
types  of  reactions)  with  any  number  of  molecules  in  the  system.  For  the  system  with  large  numbers  of 
molecules,  the  trajectories  generated  by  stochastic  Monte  Carlo  simulations  converge  to  the  trajectory 
generated  by  deterministic  differential  equations  [1]. 

3  TAU-LEAPING  ALGORITHM 


The  tau-leaping  method  calculates  a  time  interval  x  which  encompasses  more  than  one  reaction  event 
and  satisfies  the  Leap  Condition,  i.e.,  the  expected  state  change  induced  by  the  leap  must  be  sufficiently 
small  that  no  propensity  function  changes  its  value  by  a  significant  amount.  Several  methods  have  been 
proposed  recently  to  choose  the  size  of  the  time  interval  for  tau-leaping  [3-5].  We  implemented  the  tau- 
leaping  method  proposed  by  Cao  et  al.  [6].  In  that  method  a  tau  selection  formula  is  given  by 

[ max{£r;.  /  g(.,l}  maxjac,.  /g(.,l}2 


T  =  mint 

ml, 


|  ftW 


crf(x) 


(5) 


where  g,  is  the  highest  order  of  reaction  in  which  species  5,  appears  as  reactant,  £  is  an  error  control  pa¬ 
rameter,  Irs  is  the  set  of  indices  of  all  reactant  species,  and 


j 

(6) 

&f(x)  =  YJvfiai(x)- 

(7) 
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After  the  time  interval  x'  has  been  selected,  the  number  of  firings  of  each  reaction  channel  during 
this  time  interval  is  approximated  as  a  Poisson  random  variable.  The  Poisson  random  variable  can  have 
arbitrary  large  sample  value  and  it  is  possible  that  the  population  of  some  of  the  molecular  species  can 
run  negative.  Therefore,  a  critical  number  of  molecules  nc  (typically  in  the  range  of  5-20)  was  intro¬ 
duced.  If  the  number  of  molecular  species  gets  less  then  nc,  all  reaction  in  which  that  species  appears  as 
reactant  are  defined  as  critical.  These  reactions  are  simulated  by  the  stochastic  simulation  algorithm.  We 
calculate  the  sum  of  propensity  functions  of  all  the  critical  reactions  a()  and  generate  a  second  candidate 
time  x"  according  to 

1  ,  1 
T  =—  log-. 

a:  r 


(8) 


The  actual  time  leap  x  is  selected  as  the  smaller  of  x'  and  x".  If  x  =  x',  a  number  of  firings  ki  is  generated 
as  a  Poisson  random  variable  with  mean  a;  (x')T  for  all  of  the  noncritical  reaction  and  reactions  are  exe¬ 
cuted;  no  critical  reaction  is  executed  in  this  case.  If  x=x",  we  generate  ki  and  fire  noncritical  reactions  as 
in  the  previous  case;  also,  another  random  number  with  uniform  distribution  is  generated  to  find  which 
critical  reaction  needs  to  be  fired  according  to  Eq.  (1). 

Evaluations  of  the  performance  of  the  tau-leaping  algorithm  shows  a  2-3  fold  speed  up  of  simula¬ 
tions  compared  to  the  exact  stochastic  algorithm  while  maintaining  excellent  accuracy  with  regard  to 
both  rare  and  frequent  events. 

4  BIOMOLECULAR  NETWORK  SIMULATOR 

The  Biomolecular  Network  Simulator  uses  a  combination  of  MATLAB  and  C-coded  modules.  The 
front-end,  graphical  user  interface  (GUI)  and  analysis  tools  of  BNS  are  written  in  MATLAB,  while  the 
simulation  core  engine  is  written  in  C.  Such  a  combination  allows  one  to  use  the  interactive  features  and 
visualization  tools  of  MATLAB,  while  achieving  high  speed  for  the  computationally  intensive  part  of 
the  software.  The  parallelization  of  the  code  is  done  with  the  help  of  the  MPI  library.  The  BNS  can  be 
run  on  any  computer  platform  where  MATLAB  6.5  or  newer  is  installed. 

4.1  Input  Data 

A  model  is  a  set  of  mathematical  relationships  that  describe  the  behavior  of  biochemical  reactions  that 
control  cellular  biological  processes.  Each  of  the  ‘Model’  directories  contains  one  or  more  subdirectories 
with  model  description  files  and/or  different  sets  of  parameters  for  the  same  model  and  an  ‘Output’  di¬ 
rectory  where  the  results  of  simulations  are  stored.  There  are  two  types  of  model  directories:  one  for 
models  defined  in  the  Systems  Biology  Markup  Language  (SBML)  format  [5]  and  one  for  models  de¬ 
fined  as  a  set  of  MATLAB  m-files,  referred  to  as  the  BNS  format.  In  addition,  BNS  allows  one  to  per¬ 
form  simulations  with  multiple  parameter  sets,  with  each  parameter  set  being  run  multiple  times.  Simu¬ 
lations  with  multiple  parameter  sets  can  be  used  for  optimization  and  sensitivity  analysis  of  the  model. 


Figure  1.  Scaling  of  BNS  with  the  number  of  processors. 


4.2  Output  Data 

There  are  two  types  of  output  files:  snapshot  data  and  event  log  data.  Snapshot  data  files  contain  the 
state  of  the  system  (number  of  molecules  of  each  molecular  species)  at  user  specified  time  intervals.  The 
information  stored  in  the  snapshot  files  are  used  to  create  runtime  interactive  graphics  and  for  post  hoc 


analysis  of  the  data.  The  second  type  of  output  files  -  the  event  log  files  -  contain  the  record  of  every 
discrete  event  that  occurs  during  the  simulation.  The  user  should  be  aware  that  event  log  files  may  re¬ 
quire  considerable  memory  or  hard  disk  space  and,  therefore,  may  create  memory  management  prob¬ 
lems  for  simulations  involving  a  large  number  of  long  runs  or  for  large  reaction  networks. 

4.3  Parallelization 


The  parallelization  of  the  BNS  code  is  accomplished  using  the  MPI  library.  In  our  parallelization 
scheme,  the  ‘master’  processor  divides  the  total  number  of  user  specified  runs  between  the  available 
processors,  sends  a  set  of  jobs  to  each  of  the  ‘workers’  and  performs  some  of  the  simulation  runs  itself. 
In  this  approach  to  parallelization,  sometimes  called  “embarrassingly  parallel”  we  reduce  the  communi¬ 
cation  between  the  nodes  and  increase  the  speedup  of  multiple  simulations.  To  test  BNS  scaling  with  the 
number  of  processors,  we  ran  1000  simulations  on  an  SGI  Origin  3900  machine  at  ASC  MSRC,  which 
has  shared  memory  architecture.  A  simulation  here  is  defined  as  a  single  run  of  the  mathematical  model 
of  a  particular  biochemical  reaction  network.  A  92-fold  speedup  was  observed  by  running  BNS  on  100 
processors  (Figure  1). 


4.4  Running  the  Simulations 


The  BNS  can  be  run  either  in  command  line  mode  or  via  a  GUI.  The  GUI  allows  the  user  to  modify 
model  parameters  at  runtime  and  to  execute  simulations  in  the  interactive  or  batch  mode  on  HPC  re¬ 
sources. 


(a) 


(b) 


Figure  2.  The  screen  shots  of  the  BNS  GUI  dialog  windows,  (a)  The  main  BNS  dialog  window,  (b) 
The  parameters  dialog  window  which  allows  users  to  modify  the  model  parameters  and  to  set  simulation 
parameters. 

The  main  dialog  window  of  the  BNS  GUI  is  shown  in  Figure  2.  It  allows  the  user  to  select  the  ap¬ 
propriate  ‘Model’  and  ‘Parameters’  directories  and  set  the  ‘Run’  mode.  A  click  on  the  ‘Details’  button 


next  to  the  ‘Parameters’  directory  opens  a  new  window,  shown  in  Figure  2(b).  This  dialog  window  al¬ 
lows  the  user  to  modify  model  parameters  and  to  set  parameters  for  the  simulation. 

If  simulations  are  run  in  the  interactive  mode,  partial  results  of  the  simulations  will  appear  on  the 
screen  during  the  run.  Usually,  HPC  centers  allocate  limited  resources  (in  number  of  processors  and 
running  time)  for  interactive  simulations,  therefore  BNS  can  be  run  in  ‘Batch’  mode.  In  this  mode  all 
output  data  are  stored  on  the  hard  drive  for  further  analysis. 

4.5  Analysis 

The  Biomolecular  Network  Simulator  has  a  comprehensive  set  of  tools  for  post-simulation  analysis.  A 
GUI  for  the  analysis  tools  allows  the  user  to  easily  select  the  data  and  to  set  conditions  for  the  analysis. 
Multiple  types  of  post-simulation  analyses  are  available. 

4.5.1  Plots  of  number  of  molecules  vs.  time 


The  most  frequently  used  type  of  analysis  is  a  plot  of  the  number  of  molecules  vs.  time.  Such  plots  are 
available  in  the  interactive  mode  or  as  post-simulation  analysis.  There  are  two  ways  to  create  plots:  each 
compound  is  plotted  on  a  separate  figure  or  multiple  compounds  are  plotted  on  the  same  figure  window 
(grouping  mode).  The  number  of  molecules  vs.  time  plots  can  be  created  with  both  types  of  output  files: 
snapshot  data  or  event  log  data. 
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Figure  3.  The  averaged  number  of  compounds  S 1  and  S2  in  the  simulation  interval  (0,  2000)  obtained 
using  the  exact  Gillespie  stochastic  algorithm  and  approximate  tau-leaping  algorithm  (8=0.1,  nc  =5). The 
time-weighted  average  was  calculated  using  a  50-sec  time-averaging  interval  and  the  individual  time- 
weighted  averages  were  averaged  over  the  200  simulation  runs.  Data  for  the  mean  +  SD  are  shown. 

4.5.2  Time-weighted  average  analysis 

A  time-weighted  average  analysis  refers  to  the  calculations  of  the  average  number  of  molecules  of  a  par¬ 
ticular  molecular  species  during  a  user  selected  time-averaging  interval.  The  average  is  weighted  accord¬ 
ing  to  the  amount  of  time  the  compound  exists  in  each  state  during  the  selected  time-averaging  interval. 
The  time  weighted  average  is  then  plotted  versus  time.  The  averaging  analysis  can  be  performed  for  a 


single  run  or  for  a  selected  set  of  runs.  When  the  analysis  is  applied  to  multiple  runs,  the  plot  shows  the 
between  run  average  (the  average  of  each  individual  time-weighted  average)  and  standard  deviation. 

The  graphs  in  Figure  3  show  the  behavior  of  two  molecular  species,  SI  and  S2,  over  the  time  in¬ 
terval  of  2000  seconds  for  a  biomolecular  reaction  network  containing  transcription,  translation  and 
metabolic  reactions.  The  simulations  were  carried  out  using  two  different  algorithms  implemented  in 
BNS:  Gillespie  stochastic  algorithm  and  tau-leaping  algorithm  with  parameters  £=0.1  and  nc=  5.  Figure 
3  shows  the  between  run  average  of  the  time-weighted  average  number  of  molecules  for  200  runs  using 
a  time-averaging  interval  of  50  sec.  The  average  number  of  molecules  of  species  SI  changed  in  the 
range  of  -0-20  molecules,  while  the  average  number  of  molecules  of  species  S2  changed  from  0  to 
-2500  molecules.  Results  obtained  using  an  approximate  tau-leaping  algorithm  are  almost  indistinguish¬ 
able  from  the  result  of  exact  Gillespie  algorithm  for  both  types  of  species.  On  the  other  hand,  using  the 
tau-leaping  algorithm  speeds  up  simulations  more  than  3-fold  compared  to  the  exact  stochastic  algo¬ 
rithm. 

4.5.3  Reaction  frequency  analysis 

Complex  biomolecular  reaction  networks  usually  contain  reactions  that  occur  on  different  time  scales: 
some  reactions  have  a  low  propensity  and  occur  rarely;  other  reactions  are  very  fast  and  occur  fre¬ 
quently.  The  data  stored  in  the  event  log  files  allow  the  user  to  perform  various  reaction  frequency 
analyses  of  the  simulation  data  to  learn  more  about  the  basic  nature  of  the  system.  One  type  of  analysis 
creates  plots  of  the  total  number  of  times  each  reaction  in  the  network  occurred  during  the  simulation. 
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Figure  4.  The  averaged  number  of  reaction  occurrences  per  time-averaging  interval  for  the  reaction 
channels  rxn21  and  rxn24  obtained  using  the  exact  Gillespie  stochastic  algorithm  and  approximate  tau- 
leaping  algorithm  (£=  0.1,  nc=  5).  The  time-averaged  rate  was  calculated  using  a  5  sec  time-averaging 
interval  and  the  individual  time-averaged  rates  were  averaged  over  the  200  simulation  runs.  Data  for  the 
mean  ±  SD  are  shown. 

A  second  type  of  analysis  is  the  average  and  standard  deviation  of  the  time-averaged  reaction  fre¬ 
quency  in  each  reaction  channel.  Figure  4  shows  an  example  of  the  time-averaged  reaction  frequency  for 
two  reaction  channels  averaged  over  the  200  runs  obtained  by  running  simulations  with  the  two  algo¬ 
rithms.  The  reaction  rxn21  belongs  to  the  group  of  “slow”  reactions  with  the  frequency  of  occurrences  in 


the  range  of  0-20  firings  per  sec.  The  reaction  rxn24  is  a  “fast”  reaction  and  reached  500-600  occur¬ 
rences  per  sec.  As  in  the  case  of  average  number  of  molecules,  the  tau-leaping  algorithm  shows  an  ex¬ 
cellent  agreement  with  the  results  from  exact  stochastic  simulations  for  this  particular  biomolecular  re¬ 
action  network. 

5  CONCLUSIONS 

The  Biomolecular  Network  Simulator  allows  the  users  to  simulate  the  behavior  of  complex  biological 
processes  utilizing  the  capacities  of  high  performance  computers.  Some  of  the  features  that  distinguish 
BNS  from  similar  tools  are: 

•  usage  of  MATLAB  and  C-coded  functions  allows  the  user  to  combine  intensive  visualization  of 
data  with  high  speed  computations; 

•  parallelized  code  for  multiple  simultaneous  simulations  allows  the  user  to  run  BNS  on  multi¬ 
processor  machines; 

•  options  to  run  the  code  in  the  interactive  or  batch  mode; 

•  user  friendly  graphical  user  interface  allows  the  user  to  easily  set  and  modify  parameters  of  the 
model,  simulation  and  analysis;  and 

•  comprehensive  tool  sets  provide  for  post-simulation  analysis  of  results. 
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